For the water quality analysis task, I will be using a dataset of water samples that records nine water quality measurements (pH, hardness, dissolved solids, chloramines, sulfate, conductivity, organic carbon, trihalomethanes and turbidity) along with a label indicating whether each sample is potable. Each of these factors affects whether water is safe to drink, so we will briefly explore every feature in the dataset. Dataset: https://raw.githubusercontent.com/amankharwal/Website-data/master/water_potability.csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
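# Load the dataset straight from the URL above; a local copy such as
# r"C:\Users\$$$\Downloads\water_quality.csv" can be passed to read_csv instead
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/water_potability.csv")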
data.head()
| | ph | Hardness | Solids | Chloramines | Sulfate | Conductivity | Organic_carbon | Trihalomethanes | Turbidity | Potability |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 204.890455 | 20791.318981 | 7.300212 | 368.516441 | 564.308654 | 10.379783 | 86.990970 | 2.963135 | 0 |
| 1 | 3.716080 | 129.422921 | 18630.057858 | 6.635246 | NaN | 592.885359 | 15.180013 | 56.329076 | 4.500656 | 0 |
| 2 | 8.099124 | 224.236259 | 19909.541732 | 9.275884 | NaN | 418.606213 | 16.868637 | 66.420093 | 3.055934 | 0 |
| 3 | 8.316766 | 214.373394 | 22018.417441 | 8.059332 | 356.886136 | 363.266516 | 18.436524 | 100.341674 | 4.628771 | 0 |
| 4 | 9.092223 | 181.101509 | 17978.986339 | 6.546600 | 310.135738 | 398.410813 | 11.558279 | 31.997993 | 4.075075 | 0 |
data.isnull().sum()
| Column | Missing values |
|---|---|
| ph | 491 |
| Hardness | 0 |
| Solids | 0 |
| Chloramines | 0 |
| Sulfate | 781 |
| Conductivity | 0 |
| Organic_carbon | 0 |
| Trihalomethanes | 162 |
| Turbidity | 0 |
| Potability | 0 |
data = data.dropna()  # drop every row that has at least one missing value
data.isnull().sum()
All ten columns (ph, Hardness, Solids, Chloramines, Sulfate, Conductivity, Organic_carbon, Trihalomethanes, Turbidity, Potability) now have 0 missing values.
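Dropping every row with a missing value is simple, but it can discard a large share of the samples. A minimal sketch for checking that cost before committing to `dropna()`, assuming a DataFrame shaped like `data`; the helper name `missing_report` is hypothetical and the toy values are made up, not taken from the dataset.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> tuple[pd.Series, float]:
    """Return the fraction of missing values per column and the
    fraction of rows that would survive df.dropna()."""
    missing_share = df.isnull().mean()       # NaN fraction in each column
    kept_share = len(df.dropna()) / len(df)  # rows left after dropping any NaN
    return missing_share, kept_share


# Toy frame with made-up values, NOT the real water quality data
toy = pd.DataFrame({
    "ph": [7.1, np.nan, 8.0, 6.9],
    "Sulfate": [np.nan, 330.0, np.nan, 310.0],
    "Potability": [0, 1, 0, 1],
})
share, kept = missing_report(toy)
print(share)
print(f"rows kept after dropna: {kept:.0%}")
```

With the tutorial's DataFrame, `missing_report(data)` would be called before the `dropna()` line.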
plt.figure(figsize=(8, 5))
# Number of samples in each class: 0 = unsafe, 1 = safe
sns.countplot(x="Potability", data=data)
plt.title("Distribution of Unsafe (0) and Safe (1) Water")
plt.show()
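The count plot shows whether the two classes are balanced. The same information can be read as proportions with `value_counts(normalize=True)`; a minimal sketch on a made-up label column (with the real data this would be `data["Potability"]`):

```python
import pandas as pd

# Made-up labels for illustration only
labels = pd.Series([0, 0, 0, 1, 1], name="Potability")

# Fraction of samples in each class (0 = unsafe, 1 = safe), ordered by class
proportions = labels.value_counts(normalize=True).sort_index()
print(proportions)
```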
import plotly.express as px
The ph column holds the pH value of the water, an important indicator of its acid-base balance. The pH of drinking water should be between 6.5 and 8.5.
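To go with the histogram, the 6.5 to 8.5 range can also be checked numerically. A minimal sketch using `Series.between` (inclusive at both ends by default), on made-up samples rather than the real dataset:

```python
import pandas as pd

# Made-up samples for illustration only
df = pd.DataFrame({
    "ph": [5.9, 6.5, 7.2, 8.5, 9.1, 7.0],
    "Potability": [0, 1, 1, 0, 0, 1],
})

# True where 6.5 <= ph <= 8.5
df["ph_in_range"] = df["ph"].between(6.5, 8.5)

# Share of in-range samples within each Potability class
share_in_range = df.groupby("Potability")["ph_in_range"].mean()
print(share_in_range)
```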
fig=px.histogram(data,x='ph',
color='Potability',
title='Factors Affecting Water Quality: pH')
fig.show()
The hardness of water usually depends on its source, but water with a hardness of 120 to 200 milligrams per litre (mg/L) is generally considered drinkable.
fig=px.histogram(data,x='Hardness',
color='Potability',
title='Factors Affecting Water Quality: Hardness')
fig.show()
All the organic and inorganic substances dissolved in water are collectively called dissolved solids. Water with a very high concentration of dissolved solids is highly mineralized. The Solids column records this factor:
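Histograms are easier to interpret alongside a numeric summary. A minimal sketch that computes the median of each feature per Potability class with `groupby`; the values are made up, and with the real data the call would be `data.groupby("Potability").median()`.

```python
import pandas as pd

# Made-up values for illustration only
df = pd.DataFrame({
    "Solids": [20000.0, 18000.0, 22000.0, 30000.0],
    "Hardness": [200.0, 180.0, 190.0, 210.0],
    "Potability": [0, 0, 1, 1],
})

# One row per class, one column per feature
medians = df.groupby("Potability").median()
print(medians)
```

The median is used here rather than the mean because it is less sensitive to the long tails that histograms of mineral content often show.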
fig=px.histogram(data,x='Solids',
color='Potability',
title='Factors Affecting Water Quality: Solids')
fig.show()
Chloramine and chlorine are disinfectants used in public water systems.
fig=px.histogram(data,x='Chloramines',
color='Potability',
title='Factors Affecting Water Quality: Chloramines')
fig.show()
Sulfates are substances naturally present in minerals, soil and rocks. Water containing less than 500 milligrams per litre (mg/L) of sulfate is considered safe to drink.
fig=px.histogram(data,x="Sulfate",
color='Potability',
title="Factors Affecting Water Quality: Sulfate")
fig.show()
Water conducts electricity because of the ions dissolved in it; the purest form of water is a poor conductor. Water with an electrical conductivity below 500 μS/cm is considered drinkable.
figure = px.histogram(data, x = "Conductivity",
color = "Potability",
title= "Factors Affecting Water Quality: Conductivity")
figure.show()
Organic carbon comes from the breakdown of natural organic matter as well as from synthetic sources. Water containing less than 25 milligrams per litre (mg/L) of organic carbon is considered safe to drink.
figure = px.histogram(data, x = "Organic_carbon",
color = "Potability",
title= "Factors Affecting Water Quality: Organic Carbon")
figure.show()
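Trihalomethanes (THMs) are by-products that form when chlorine reacts with organic matter, so they are found in chlorine-treated water. Water containing less than 80 micrograms per litre (µg/L) of THMs is considered safe to drink.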
figure = px.histogram(data, x = "Trihalomethanes",
color = "Potability",
title= "Factors Affecting Water Quality: Trihalomethanes")
figure.show()
The turbidity of water depends on the amount of solid matter held in suspension. Water with a turbidity below 5 NTU (nephelometric turbidity units) is considered drinkable.
figure = px.histogram(data, x = "Turbidity",
color = "Potability",
title= "Factors Affecting Water Quality: Turbidity")
figure.show()
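The limits quoted for the individual factors can be combined into a simple rule-based check and compared with the Potability label. This is an exploratory sketch, not how the dataset's labels were produced: the rule table `RULES` and the helper `passes_all` are hypothetical, the toy rows are made up, and Solids and Chloramines are left out because no limit is stated for them. Ranges (pH, hardness) are treated as inclusive and the "less than" limits as strict.

```python
import pandas as pd

# Limits as stated in the text; each rule maps a column to a boolean test
RULES = {
    "ph": lambda s: s.between(6.5, 8.5),
    "Hardness": lambda s: s.between(120, 200),
    "Sulfate": lambda s: s < 500,
    "Conductivity": lambda s: s < 500,
    "Organic_carbon": lambda s: s < 25,
    "Trihalomethanes": lambda s: s < 80,
    "Turbidity": lambda s: s < 5,
}


def passes_all(df: pd.DataFrame) -> pd.Series:
    """True for rows that satisfy every rule in RULES."""
    checks = pd.DataFrame({col: rule(df[col]) for col, rule in RULES.items()})
    return checks.all(axis=1)


# Made-up rows for illustration only
toy = pd.DataFrame({
    "ph": [7.0, 9.2, 7.5],
    "Hardness": [150.0, 180.0, 180.0],
    "Sulfate": [300.0, 350.0, 520.0],
    "Conductivity": [400.0, 450.0, 420.0],
    "Organic_carbon": [10.0, 12.0, 14.0],
    "Trihalomethanes": [60.0, 70.0, 65.0],
    "Turbidity": [3.0, 4.0, 3.5],
    "Potability": [1, 1, 0],
})

passes = passes_all(toy)
predicted = passes.astype(int)
# Fraction of rows where the rule-based verdict matches the label
agreement = (predicted == toy["Potability"]).mean()
print(passes.tolist(), agreement)
```

Running `passes_all(data)` on the real DataFrame would show how closely the quoted limits line up with the dataset's own Potability labels.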
The goal of this analysis is to see how the distribution of each factor differs between potable and non-potable water.